Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(scheduler): update otel-collector-config after jaeger endpoint does not supported #5170

Closed
wants to merge 1 commit into from

Conversation

ducminhle
Copy link

@ducminhle ducminhle commented Dec 12, 2023

What this PR does / why we need it:
With existing config: otel-collector-config.yaml, I got an error when try to run the make deploy-local, otel-collector container crashed with logs:

Error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'exporters': unknown type: "jaeger" for id: "jaeger" (valid values: [file prometheusremotewrite skywalking splunk_hec cassandra dataset influxdb kafka awss3 otlp azuremonitor googlecloudpubsub loadbalancing zipkin logging datadog elasticsearch sapm sentry sumologic tanzuobservability tencentcloud_logservice debug carbon clickhouse mezmo prometheus azuredataexplorer awsxray f5cloud googlecloud pulsar syslog alibabacloud_logservice coralogix logzio loki opensearch awskinesis awscloudwatchlogs awsemf dynatrace googlemanagedprometheus honeycombmarker instana logicmonitor otlphttp signalfx opencensus])
2023/12/11 06:42:49 collector server run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'exporters': unknown type: "jaeger" for id: "jaeger" (valid values: [file prometheusremotewrite skywalking splunk_hec cassandra dataset influxdb kafka awss3 otlp azuremonitor googlecloudpubsub loadbalancing zipkin logging datadog elasticsearch sapm sentry sumologic tanzuobservability tencentcloud_logservice debug carbon clickhouse mezmo prometheus azuredataexplorer awsxray f5cloud googlecloud pulsar syslog alibabacloud_logservice coralogix logzio loki opensearch awskinesis awscloudwatchlogs awsemf dynatrace googlemanagedprometheus honeycombmarker instana logicmonitor otlphttp signalfx opencensus])

I found the problem: opentelemetry-collector-contrib has removed jaeger and jaegerthrifthttp exporters from this commit: open-telemetry/opentelemetry-collector-contrib#26546. We need to update the endpoint to fix this issue (open-telemetry/opentelemetry-collector-contrib#15291 (comment))

Which issue(s) this PR fixes:

Special notes for your reviewer:

@seldondev
Copy link
Collaborator

Hi @ducminhle. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@sakoush sakoush self-requested a review December 12, 2023 20:31
@sakoush sakoush added the v2 label Dec 13, 2023
Copy link
Member

@sakoush sakoush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ducminhle many thanks for your contributions, this looks good in general. I left one comment on whether we still need to expose port 14250?

This PR covers docker compose installations, however the same issue you are fixing exists potentially with installations in k8s, e.g. check: tracing/k8s/otel-collector.yaml. If you are willing to fix it as well in k8s that would be great, but happy to leave it to a separate contribution.

Please also outline the tests that you have done and any screenshots you might have. e.g. to make sure that traces appear in jeager ui after this change.

scheduler/all-base.yaml Outdated Show resolved Hide resolved
@ducminhle ducminhle changed the title fix(scheduler): update otel-collector-config after jaeger endpoint do… fix(scheduler): update otel-collector-config after jaeger endpoint does not supported Dec 13, 2023
@ducminhle ducminhle force-pushed the v2 branch 3 times, most recently from ff8a241 to 1b10b4c Compare December 14, 2023 19:21
@ducminhle
Copy link
Author

ducminhle commented Dec 15, 2023

Thank you, @sakoush. I have solved your comment. Here is some screenshot, please let me know if you need more information.
image
image
image
image
image

This PR covers docker compose installations, however the same issue you are fixing exists potentially with installations in k8s, e.g. check: tracing/k8s/otel-collector.yaml. If you are willing to fix it as well in k8s that would be great, but happy to leave it to a separate contribution.

In tracing/k8s/Makefile, I checked that we still use jaeger-operator v1.33.0 (I'm not sure this version is removed jaeger). My laptop does not have enough resources to run k8s, I will check it later and create new PR if we need to update.

@ducminhle ducminhle requested a review from sakoush December 15, 2023 04:58
@seldondev
Copy link
Collaborator

@cin-otis: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@ducminhle
Copy link
Author

/ok-to-test

@seldondev
Copy link
Collaborator

@ducminhle: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@ducminhle ducminhle requested a review from lc525 as a code owner January 26, 2024 18:18
lc525 added a commit that referenced this pull request Feb 10, 2024
- fixes dataflow engine logging errors by specifying the otel exporter protocol

the updates are needed because the opentelemetry-java-instrumentation library requires a http(s) URI for the otel collector endpoint, regardless of the actual protocol. As the default for us is grpc, explicitly set the OTEL_EXPORTER_OTLP_PROTOCOL environment variable on dataflow pods

- fixes otel-collector config to remove deprecated jagger exporter (jagger now supports otel directly).

add the OTEL_EXPORTED_OTLP_PROTOCOL key to the seldon-tracing configMap and update the operator, crds and helm charts to support getting the value for this key from tracing config, similarly to how OTEL_EXPORTER_OTLP_PROTOCOL is fetched

- update versions used by ansible for jagger and opentelemetry-operator
- port 14250 no longer needs to be exposed under any config
- fix dependency ordering for dataflow/gradle.

previous ordering caused kafka-streams not to be able to find the slf4j logging provider
this lead to logs produced by kafka-streams not being recorded

Fixes #
Internal issue references:

#INFRA-568 Jagger latest is crashing
#INFRA-464 Otel is not able to parse its config (deprecated exporters)

Public issues:

otel-collector-1 container not starting: #5189 
related PR with partial otel functionality: #5170
@lc525
Copy link
Member

lc525 commented Feb 10, 2024

I believe this is now solved via #5291, which includes changes required for things to work on k8s, as well as in local deployments. Many thanks for your contribution, and for bringing the issue to our attention.

@lc525 lc525 closed this Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants